Local Clustering in Provenance Graphs (Extended Version)

Authors

  • Peter Macko
  • Daniel Margo
  • Margo Seltzer
Abstract

Systems that capture and store data provenance, the record of how an object has arrived at its current state, accumulate historical metadata over time, forming a large graph. Local clustering in these graphs, in which we start with a seed vertex and grow a cluster around it, is important because it supports critical provenance applications such as identifying semantically meaningful tasks in an object’s history and selecting appropriate truncation points for returning an object’s ancestry or lineage. Generic graph clustering algorithms are not effective at producing semantically meaningful clusters in provenance graphs. We identify three key properties of provenance graphs and exploit them to justify two new centrality metrics that we developed specifically for local clustering on provenance graphs.


Similar resources

Improved COA with Chaotic Initialization and Intelligent Migration for Data Clustering

A well-known clustering algorithm is K-means. This algorithm, besides advantages such as high speed and ease of employment, suffers from the problem of local optima. In order to overcome this problem, a lot of studies have been done in clustering. This paper presents a hybrid Extended Cuckoo Optimization Algorithm (ECOA) and K-means (K), which is called ECOA-K. The COA algorithm has advantages ...


A First Study on Clustering Collections of Workflow Graphs

As workflow systems become more widely used, the number of workflows and the volume of provenance they generate have grown considerably. New tools and infrastructure are needed to allow users to interact with, reason about, and re-use this information. In this paper, we explore the use of clustering techniques to organize large collections of workflow and provenance graphs. We propose two different...


Abstract Provenance Graphs: Anticipating and Exploiting Schema-Level Data Provenance

Daniel Zinn, Bertram Ludäscher ({dzinn,ludaesch}@ucdavis.edu). Provenance graphs capture flow and dependency information recorded during scientific workflow runs, which can be used subsequently to interpret, validate, and debug workflow results. In this paper, we propose a new concept, called abstract provenance g...


Temporal Provenance Model (TPM): Model and Query Language

Provenance refers to the documentation of an object’s lifecycle. This documentation (often represented as a graph) should include all the information necessary to reproduce a certain piece of data or the process that led to it. In a dynamic world, as data changes, it is important to be able to get a piece of data as it was, and its provenance graph, at a certain point in time. Supporting time-a...


Named Graphs as a Mechanism for Reasoning About Provenance

Named Graphs is a simple, compatible extension to the RDF abstract syntax that enables statements to be made about RDF graphs. This approach is in contrast to earlier attempts such as RDF reification, or knowledge-base specific extensions including quads and contexts. In this paper we demonstrate the use of Named Graphs and our experiences developing new kinds of semantic web application that b...




Publication date: 2013